CONCORDIA UNIVERSITY

Learning Development Office
2492 West Broadway
Tel.: 482-0320, locals 397/695

TEACHING AND LEARNING
Volume 12, Number 2
By: Ronald Smith


FINAL EXAMS 
Measuring Student Learning 


As teachers, we are always monitoring our students' learning through their behaviour in class, their questions (or lack of them), and their performance on various projects and tests. However, for many professors the final exam is still the ultimate indication of how much their students have learned. The purpose of this newsletter is not to discuss the general merits of giving final exams, but rather to examine the most common types of exams and their appropriateness as tools for measuring different types of learning. This newsletter is adapted from Planning Instruction by P.A. Cranton and C.B. Weston. Patricia Cranton will give two workshops on test construction at Concordia, on October 25th and November 1st. For more information, see the notice at the end of this newsletter.


Before examining the issue of matching the type of test to the learning objectives, it is necessary to review both the types of tests and the levels of objectives. In general, testing procedures can be divided into two categories: objectively scored tests and subjectively scored tests.


Objectively-Scored Tests 


As the label implies, objectively scored tests are ones in which each question or item has only one right answer, and any two individuals who score the answers using a prepared answer key will obtain the same results. Four commonly used objectively scored test formats are multiple choice, true-false, matching, and short answer.
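Because every item has exactly one keyed answer, scoring reduces to a mechanical comparison, which is why two scorers using the same key must agree. The Python sketch below illustrates this under an assumption of our own: responses and the answer key are stored as dictionaries keyed by item number (the function name `score_objective` is hypothetical, not part of any published procedure).

```python
def score_objective(responses, answer_key):
    """Count the items whose response matches the keyed answer.

    Both arguments are dicts mapping item numbers to answers, a
    hypothetical layout chosen only for this illustration. An item
    the student left blank (missing from responses) scores zero.
    """
    return sum(
        1
        for item, keyed in answer_key.items()
        if responses.get(item) == keyed
    )


key = {1: "b", 2: "true", 3: "d"}
student = {1: "b", 2: "false", 3: "d"}

# Any two scorers running this with the same key get the same total.
print(score_objective(student, key))  # prints 2
```

No judgment enters at any step; the only human decision is the content of the key itself.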


Multiple choice tests. The multiple choice item consists of a statement or question (called the "stem") followed by a set of alternative endings to the statement or answers to the question. The wrong alternatives are often called "distractors" and usually number from two to four. Although specific guidelines for writing items will be presented in one of the workshops, items should have clear, unambiguous stems, with correct answers that are not a matter of opinion or controversy. The distractors should be short, plausible, and grammatically consistent both with each other and with the stem. The correct answer should be clearly right, not merely "close" or "best."


True-false tests. A true-false test consists of a series of statements which the student must label as true or false. Statements must be concise, straightforward, and clearly statements of fact, without qualifiers or stated opinions or attitudes (unless these are attributed to a writer, e.g. "Skinner (1972) argued that..."). Since the ease of guessing the answer is an obvious disadvantage of this format, some variations of the traditional item have been developed in which the student is asked to explain why the item is false, or to underline the words or phrases that make the statement false. Some writers advocate "right minus wrong" scoring to discourage guessing; however, the student anxiety created by this procedure may reduce the accuracy of the measurement of learning more than guessing itself does.
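The "right minus wrong" idea can be written as a small formula. The sketch below assumes the common correction-for-guessing rule, score = right - wrong / (choices - 1), which for true-false items (two choices) is exactly right minus wrong. The multiple choice generalization and the function name `corrected_score` are our own additions for comparison; the newsletter itself mentions only the true-false case.

```python
def corrected_score(right, wrong, choices=2):
    """Correction-for-guessing score: right - wrong / (choices - 1).

    For true-false items (choices=2) this is simply right minus wrong.
    Omitted items are counted as neither right nor wrong.
    """
    if choices < 2:
        raise ValueError("an item needs at least two choices")
    return right - wrong / (choices - 1)


# 50 true-false items: 40 right, 10 wrong
print(corrected_score(40, 10))            # prints 30.0
# Blind guessing on all 50 items: about 25 right and 25 wrong
print(corrected_score(25, 25))            # prints 0.0
# 50 four-choice items: 40 right, 9 wrong, 1 omitted
print(corrected_score(40, 9, choices=4))  # prints 37.0
```

The second call shows the rationale for the technique: a student who guesses on every item can expect a corrected score of about zero.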


Short answer tests. The objectively scored short answer item consists of either (a) a question for which only one word or phrase, or a pre-specified list of words or phrases, is the correct response, or (b) a statement in which the student is asked to fill in missing words or phrases. In either case, to meet the criterion of being objectively scoreable, the short answer must be unambiguously correct or incorrect.
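To make "unambiguously correct or incorrect" concrete, the Python sketch below scores a hypothetical item such as "List the four objectively scored test formats" on an all-or-nothing basis against a pre-specified list. Two choices here are assumptions of the sketch, not rules from the newsletter: letter case and extra spaces are ignored, and the order of the student's answers does not matter. Any such rule would have to be fixed before marking for the scoring to stay objective.

```python
def normalize(phrase):
    # Assumed mechanical rule: ignore letter case and extra spaces,
    # so every scorer treats "Short  Answer" and "short answer" alike.
    return " ".join(phrase.lower().split())


def score_short_answer(responses, keyed_answers):
    """Return 1 if the student's words match the keyed list, else 0.

    A one-word item is simply a keyed list of length one. The order
    of the student's answers is ignored (an assumption of this sketch).
    """
    given = {normalize(r) for r in responses}
    expected = {normalize(k) for k in keyed_answers}
    return int(given == expected)


key = ["multiple choice", "true-false", "matching", "short answer"]
print(score_short_answer(["Matching", "True-False", " short answer", "multiple choice"], key))  # prints 1
print(score_short_answer(["multiple choice", "essay"], key))  # prints 0
```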


Subjectively-Scored Tests 


Subjectively-scored tests include all evaluation techniques which rely on the judgment of the instructor for scoring. As such, they include a much more diverse set of procedures than do objectively scored tests: verbal or written student responses; products such as paintings, crafts, or architectural blueprints; and performances such as interviewing skill, playing squash, or laboratory experimentation. It is often feasible, and always desirable, to make these evaluation procedures as "objective" as possible; however, some element of judgment does remain.


Subjectively-scored tests are typically used to assess learning at the higher levels of the cognitive domain (application, analysis, synthesis, and evaluation). The types of tests discussed here are essay tests, oral tests, checklists, and rating scales.


Essay tests. The essay test consists of one or more questions or topics to which the student responds in writing. It can range from a short or "restricted" format (requesting a paragraph, an outline, or a few pages) to a longer or "unrestricted" format yielding a term paper of 50 or more pages. The essay test may be given in the classroom (with or without the use of resource materials), or it may be done outside the classroom, over a period of time, using a variety of materials.


Oral tests. The oral test consists of a series of structured or semi-structured questions, or a single question or topic, to which the student responds verbally. Responses may be tape-recorded to facilitate scoring, and more than one examiner may be present. The most familiar oral tests include graduate studies comprehensive examinations or thesis defenses and assessments of second language learning. However, oral assessment may also take place in the form of interviews, discussions, or student presentations. Whenever evaluation of student learning is based on the verbal responses of students, it can be called an oral test.


Checklists. Up to this point, the evaluation techniques presented have been appropriate for the cognitive and affective domains. Evaluation of student learning in the psychomotor domain generally involves observation of performance. Observation in itself does not yield evaluation results; the observations must be recorded in an appropriate way. Checklists are used when the instructor intends to record the occurrence, or the frequency of occurrence, of behaviors or characteristics, which are checked off as they occur. There is no opportunity for ratings or judgments of quality, only a record of whether or not the item was observed.
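As a rough illustration of what a checklist can and cannot record, the Python sketch below keeps only occurrence counts and offers no way to note quality. The class name `Checklist` and the laboratory items are hypothetical examples chosen for this sketch.

```python
class Checklist:
    """Records only whether, and how often, listed behaviors occur.

    There is deliberately no way to record quality; judging how well
    something was done is the job of a rating scale.
    """

    def __init__(self, items):
        self.counts = {item: 0 for item in items}

    def check(self, item):
        # Check off one occurrence of a behavior on the list.
        if item not in self.counts:
            raise KeyError(f"not on the checklist: {item}")
        self.counts[item] += 1

    def observed(self, item):
        return self.counts[item] > 0


# Hypothetical items for observing laboratory experimentation
lab = Checklist(["labels samples", "records observations", "cleans equipment"])
lab.check("records observations")
lab.check("records observations")
lab.check("labels samples")
print(lab.counts["records observations"])  # prints 2
print(lab.observed("cleans equipment"))    # prints False
```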


Rating scales. In the evaluation of psychomotor learning, the instructor is often concerned with how well the student is able to perform a task. In such cases, a rating scale is commonly used. Like a checklist, a rating scale consists of a series of behaviors or characteristics expressed in observable terms. Each item on the list is followed by a scale (e.g. Excellent, Good, Acceptable, Unacceptable) which is used to indicate the quality of the behavior or of the characteristic of the product.
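The contrast with a checklist can be sketched in Python: each observable item receives one level from the four-point scale given in the text. The squash items are hypothetical, and the averaging step in `mean_rating` is purely an assumption for illustration, since the newsletter does not prescribe how ratings are combined.

```python
# The four levels are the example scale given in the text, ordered
# from lowest to highest.
SCALE = ("Unacceptable", "Acceptable", "Good", "Excellent")


def rate(ratings, item, level):
    """Record the quality level observed for one behavior or characteristic."""
    if level not in SCALE:
        raise ValueError(f"{level!r} is not on the scale {SCALE}")
    ratings[item] = level


def mean_rating(ratings):
    # Assumption for illustration only: treat the levels as 0-3 points
    # and average them. The newsletter does not prescribe a summary.
    points = [SCALE.index(level) for level in ratings.values()]
    return sum(points) / len(points)


# Hypothetical items for observing squash play
squash = {}
rate(squash, "grip", "Good")
rate(squash, "footwork", "Acceptable")
rate(squash, "serve", "Excellent")
print(squash["serve"])      # prints Excellent
print(mean_rating(squash))  # prints 2.0
```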


Selecting the Appropriate Testing Technique


A wide variety of testing techniques has been described. In some cases the selection of a technique for assessing the learning of an objective may seem straightforward; however, several variables influence this decision. The appropriateness of a particular test format may depend, to varying degrees, on: the domain and level of the student learning; practical considerations such as class size, facilities, time limitations, or certification requirements; and special student characteristics such as verbal ability, handicaps, age, or previous test experience.


This section will discuss guidelines for selecting the type of test; it is important, however, to remember that there are no absolute formulae and that more than one technique may be appropriate for a particular situation.


When extreme examples are considered, common sense reveals that certain test formats cannot be used for specific types of learning: no one would consider evaluating diving performance with a multiple choice test, or students' ability to define terms with a rating scale. In many instructional situations, however, the decision is not so clear.


Table 1 (see page 5) presents a review of Bloom's taxonomy of cognitive learning outcomes. (Tables are also available for learning outcomes in the affective and psychomotor domains.)


Table 2 (see page 7) presents a matrix of the cognitive levels of learning by testing techniques. Each type of test is judged as "always appropriate" for the level of learning (YES), "can be appropriate in some situations" (MAYBE), or "never appropriate" (NO). Note that several test formats are considered "always appropriate" for some of the levels of learning; for example, all of the objectively scored tests are appropriate for the knowledge and comprehension levels. In these cases, the decision as to which format to use would depend on instructor preference, or perhaps on some of the student characteristics.


Where a testing technique is described as "can be appropriate in some situations," the decision usually depends on the subject area, the level of the instruction, or perhaps the instructor's skill at item-writing. For example, a skilled item-writer working in a structured subject area could make use of multiple choice items for the analysis level of the cognitive domain; however, in most cases this would not be appropriate. The use of checklists or rating scales in the cognitive domain would also be a special case, most likely where the cognitive learning was revealed through performance, such as in medical training or in a nursing program.
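The selection logic described in the two paragraphs above (prefer a format judged YES, consider a MAYBE format only when conditions such as subject area or item-writing skill allow, never use a NO format) can be sketched as a lookup in Python. The dictionary below deliberately holds only the judgments stated in the prose of this newsletter (objectively scored formats for knowledge and comprehension, multiple choice as MAYBE for analysis); it is not a reproduction of Table 2, and the names `TABLE_2_PARTIAL` and `candidate_formats` are hypothetical.

```python
# Only the cells that the prose of this newsletter states are filled in;
# every other cell of Table 2 is left out rather than guessed.
OBJECTIVE_YES = {
    "multiple choice": "YES",
    "true-false": "YES",
    "matching": "YES",
    "short answer": "YES",
}
TABLE_2_PARTIAL = {
    "Knowledge": dict(OBJECTIVE_YES),
    "Comprehension": dict(OBJECTIVE_YES),
    "Analysis": {"multiple choice": "MAYBE"},
}


def candidate_formats(level, table=TABLE_2_PARTIAL):
    """Split the formats recorded for a level into YES and MAYBE lists.

    Formats judged NO, or not recorded, are never suggested.
    """
    judgments = table.get(level, {})
    always = sorted(f for f, j in judgments.items() if j == "YES")
    maybe = sorted(f for f, j in judgments.items() if j == "MAYBE")
    return always, maybe


print(candidate_formats("Knowledge"))
# (['matching', 'multiple choice', 'short answer', 'true-false'], [])
print(candidate_formats("Analysis"))
# ([], ['multiple choice'])
```

When the YES list holds several formats, the choice among them is left to instructor preference or student characteristics, as the text notes.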


Practical Considerations 


Very often, the "ideal" choice of test format must be modified because of practical considerations. If, for example, the instructor is teaching objectives mainly at the higher levels of the cognitive domain but has a class of 200 students, the natural choice of an essay test has obvious limitations. The most common practical considerations include class size, facilities and resources, time constraints, and the requirements of professional associations or certification boards. Each of these will be discussed briefly in terms of its effect on the choice of test, and some guidelines will be provided for modifying test choices.


When the class size is large (over 50), the instructor is usually reluctant to use any subjectively-scored testing technique because of the large amount of scoring time required. This is a particularly difficult dilemma, and most often the instructor must compromise on some aspect of the decision: either using less than desirable testing techniques, or devoting extensive time to marking. Possible suggestions (their relevance depends on the subject area, facilities, and resources) are:

(a) Use structured, restricted essay items in conjunction with a well-planned scoring system;

(b) Have teaching aides or assistants score test responses (again with the use of a detailed scoring system and, if possible, with some training or "practice");

(c) Combine short answer or multiple choice items with a small number of restricted essay items, using the objectively-scored items to assess all of the prerequisite, lower levels.


TABLE 1

Bloom's Learning Outcomes

Knowledge
- Evidence of outcome: knows common terms; knows specific facts; knows methods and procedures; knows basic concepts in course; knows principles
- Terms for measuring outcome in test questions: define, describe, identify, label, list, match, name, outline, reproduce, select, state

Comprehension
- Evidence of outcome: understands facts and principles; interprets verbal material; interprets charts and graphs; translates verbal material to mathematical formulas; estimates future consequences implied in data; justifies methods and procedures
- Terms for measuring outcome in test questions: convert, defend, distinguish, estimate, explain, extend, generalize, give examples, infer, paraphrase, predict, rewrite, summarize

Application
- Evidence of outcome: applies concepts and principles to new situations; applies laws and theories to practical situations; solves mathematical problems; constructs charts and graphs; demonstrates correct use of a method or procedure
- Terms for measuring outcome in test questions: change, compute, demonstrate, discover, manipulate, modify, operate, predict, prepare, produce, relate, show, solve, use

Analysis
- Evidence of outcome: recognizes unstated assumptions; recognizes logical fallacies in reasoning; distinguishes between facts and inferences; evaluates the relevance of data; analyzes the organizational structure of a work (art, music, writing)
- Terms for measuring outcome in test questions: break down, diagram, differentiate, discriminate, distinguish, identify, illustrate, infer, outline, point out, relate, select, separate, subdivide

Synthesis
- Evidence of outcome: writes a well-organized theme; writes a creative short story (or poem or piece of music); proposes a plan for an experiment; integrates learning from different areas into a plan for solving a problem; formulates a new scheme for classifying objects (or events or ideas)
- Terms for measuring outcome in test questions: categorize, combine, compile, compose, create, devise, design, explain, generate, modify, organize, plan, rearrange, reconstruct, relate, reorganize, revise, rewrite, summarize, tell, write

Evaluation
- Evidence of outcome: judges the logical consistency of a written passage; judges the adequacy with which conclusions are supported by data; judges the value of a work (art, music, writing) by use of internal criteria; judges the value of a work (art, music, writing) by use of external standards of excellence
- Terms for measuring outcome in test questions: appraise, compare, conclude, contrast, criticize, describe, discriminate, explain, justify, interpret, relate, summarize, support

Source: Adapted from Taxonomy of Educational Objectives: Handbook I: Cognitive Domain by Benjamin S. Bloom et al. Copyright © 1956 by Longman Inc. Reprinted by permission of Longman Inc., New York.


This newsletter has focused on testing in the cognitive domain. If you would like more information about testing in the affective and psychomotor domains, please contact our Office.


If you would like more information on the construction, scoring, and revision of various types of tests, plan to attend one of our test construction workshops.


Test Construction Workshops 


Objectively Scored Tests 
(Multiple choice, true-false, 
matching and short answer) 
October 25th, 1985 
9:30-12:30 
Hall 762-1 


After a brief review of the principles of good testing and the match between type of test and level of learning outcomes, this workshop will focus on:

- guidelines for constructing items
- item writing
- item revision

BRING SAMPLES OF YOUR TESTS 


* * * * * 


Subjectively Scored Tests
(Essay tests, oral tests, checklists, and rating scales)
November 1st, 1985
9:30-12:30
Hall 762-1


After a brief review of the 
principles of good testing and 
the match between type of test 
and level of learning outcomes, 
this workshop will focus on: 


- constructing the test 
- scoring procedures 


BRING SAMPLES OF YOUR TESTS 


Resource Person: 


Dr. Patricia Cranton, McGill University, Center for University Teaching and Learning.


TABLE 2

APPROPRIATENESS OF TESTING TECHNIQUES IN THE COGNITIVE DOMAIN

Types of tests: Multiple Choice, True/False, Short Answer, Essay, Oral, Checklist, Rating Scale

Levels of domain: Knowledge, Comprehension, Application, Analysis, Synthesis, Evaluation

YES = Always Appropriate
MAYBE = Can be Appropriate in Some Situations
NO = Never Appropriate


